Skip metric regularization if adapt window=0 #3037

Open
wants to merge 1 commit into
base: develop

nhuurre
Contributor

@nhuurre nhuurre commented Apr 12, 2021

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

This came up in #3027 (comment)

CmdStan accepts window=0 as a valid argument even though it means the adaptation cannot possibly estimate the inverse metric. The intended effect appears to be to adapt only the step size and leave the metric at its initial value, either the default identity or a user-specified value. That works, but only if you also set init_buffer=0. If init_buffer>0, the sampler tries to update the metric anyway and overwrites the initial value.

The metric update code is

estimator_.sample_variance(var);
double n = static_cast<double>(estimator_.num_samples());
var = (n / (n + 5.0)) * var
      + 1e-3 * (5.0 / (n + 5.0)) * Eigen::VectorXd::Ones(var.size());

The Welford estimator's sample_variance() is a no-op if there are not enough samples, so that part is fine.
The problem is the subsequent regularization: with zero samples, n / (n + 5.0) is 0, so the (unchanged) metric is discarded and replaced by 1e-3 in every diagonal element.
The fix is simply to skip the regularization when sample_variance() didn't do anything.

Intended Effect

Setting window=0 always keeps the initial metric.

How to Verify

Run the examples/bernoulli model with adapt init_buffer=10 window=0 and look for the mass matrix in output.csv.
On develop it says

# Diagonal elements of inverse mass matrix:
# 0.001

On this branch

# Diagonal elements of inverse mass matrix:
# 1

Side Effects

Documentation

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Niko Huurre

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.36 | 3.43 | 0.98 | -2.09% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -4.56% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 0.97 | -2.87% slower |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.99 | -0.69% slower |
| irt_2pl/irt_2pl.stan | 6.09 | 5.99 | 1.02 | 1.52% faster |
| performance.compilation | 92.0 | 88.87 | 1.04 | 3.4% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.66 | 8.64 | 1.0 | 0.23% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.3 | 29.42 | 1.0 | -0.42% slower |
| sir/sir.stan | 130.24 | 119.52 | 1.09 | 8.23% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 0.99 | -0.81% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.01 | 3.02 | 1.0 | -0.35% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.41 | 0.39 | 1.05 | 4.51% faster |
| arK/arK.stan | 1.88 | 1.84 | 1.02 | 2.01% faster |
| arma/arma.stan | 0.76 | 0.86 | 0.88 | -13.04% slower |
| garch/garch.stan | 0.56 | 0.57 | 0.99 | -1.38% slower |

Mean result: 0.997801547886

Commit hash: 3830f6f


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@betanalpha
Contributor

Although I largely agree with the update, technically inverse metric updating is skipped for both adapt_window = 0 and adapt_window = 1. I'll change the pull request name before merging.

covar = (n / (n + 5.0)) * covar
        + 1e-3 * (5.0 / (n + 5.0))
              * Eigen::MatrixXd::Identity(covar.rows(), covar.cols());
if (estimator_.num_samples() > 1) {

Because this behavior probably won't be obvious, I think we need to wire a message to the logger. Both here and in var_adaptation, can you add a std::string& msg argument that carries the message "Warning: inverse metric is not updated for window sizes less than 2"? Then we can pass that to the logger in all of the transition implementations.

@@ -27,3 +27,32 @@ TEST(McmcCovarAdaptation, learn_covariance) {
}
EXPECT_EQ(0, logger.call_count());
}

TEST(McmcCovarAdaptation, learn_covariance_one_sample) {

Corresponding test in var_adaptation_test.cpp?

@betanalpha
Contributor

If changes are made according to #3027 (comment), then no warning message would be needed for the base_window = 0 case, and messages wouldn't have to be passed. But if inverse metric updating is also disabled for base_window = 1, then I think we still need the message.


3 participants